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Abstract. Several classes of MCMC methods for the numerical solution of Bayesian 
Inverse Problems for partial differential equations (PDEs) with unknown random 
field coefficients arc considered. A general framework for their numerical analysis 
is presented. The complexity of MCMC sampling for the unknown fields from the 
posterior density, as well as the convergence of the discretization error of the PDE 
of interest in the forward response map, is analyzed. Particular attention is given 
to bounds on the overall work required by the MCMC algorithms for achieving a 
prescribed error level e. We show that the computational complexity of straightforward 
combinations of MCMC sampling strategies with standard PDE solution methods 
is generally excessive. Two computational strategics for substantially reducing the 
computational complexity of MCMC methods for Bayesian inverse problems prising 
in PDEs are studied: a parametric, deterministic gpc-type (generalized polynomial 
chaos) representation of the forward solution map of the PDE with uncertain 
coefficients, which has been proposed and implemented in the engineering literature 
(e.g. jTTl HH [16] ) ; and a new Multi-Level Monte Carlo sampling strategy of the Markov 
Chain (MLMCMC) with sampling from a multilevel discretization of the posterior 
and a multilevel discretization of the forward PDE. We compare the computational 
complexity of these gpc-MCMC and MLMCMC methods to that of the plain MCMC 
method, and provide sufficient conditions on the regularity of the unknown coefficient 
for both, the gpc-MCMC and MLMCMC method, to afford substantial complexity 
reductions over the plain MCMC approach. 
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1. Introduction 

Many inverse problems arising in differential equations require determination of 
unknown parameters u from finite dimensional data S which we assume to be related by 

5 = g(u) + $. (i) 

Here u, which we assume to belong to a function space, is an unknown input to a 
differential equation and Q is the "forward" mapping taking one instance of the input 
u into a usually finite and noisy set of observations. We model these observations 
mathematically as continuous linear functionals on the solution of the governing partial 
differential equation. In (JTJ), the parameter $ represents noise arising when observing 
and we assume that this is a single realization of a centred Gaussian N(0, E). A Bayesian 
formulation of the inverse problem then leads to the problem of probing the probability 
measure p s given by 

^(^ocexpH&M)), (2) 

where 

<S>(u;5)= 1 -\5-g(u)\l (3) 

and | • |s = • | with | ■ | the Euclidean norm, and where p is a prescribed prior 

probability measure. 

The purpose of this paper is to analyze the computational complexity of several 
Markov-Chain Monte-Carlo (MCMC) approaches to probing the distribution p s . These 
methods will incur two principal sources of error. First, the sampling error arising from 
estimating expectations conditional on given data S by sample averages of M realizations 
of the unknown u, drawn from p 5 . The error in doing so will decay as M~a as the number 
M of draws of u tends to oo. Second, the discretization error arising from approximation 
of the system response for each draw of u, i.e. the error of approximating Q{u). For 
expository purposes, and to cover a wide range of discretization techniques, we assume 
the discretization error to decay as N~^, where iVdof is the total number of degrees of 
freedom[j] and a > 0; and we assume that the work per step scales as N^ o{ as iVdof — > oo 
for some power b > so that the total work necessary for M draws in the MCMC scales 
as MN\ o{ . If (as we show in the present paper) the constant in the mean square MCMC 
error bound of order 0(M~^) is independent of A^of, then a straightforward calculation 
shows that the work to obtain error e will grow asymptotically, as e — > 0, as e~ 2 ~ h l a . 
The ratio b/a is thus crucial to the overall computational complexity of the algorithm. 

In this paper, we develop three ideas to speed up MCMC-based algorithms for 
Bayesian inversion in systems governed by partial differential equations. The first 
idea, which underlies the preceding expository calculation concerning complexity, is 
that MCMC methods can be constructed whose convergence rate is independent of the 
number of degrees of freedom used in the approximation of the forward map; the 

| logarithmic corrections also occur, and will be detailed explicitly in later sections 
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key idea here is to use MCMC algorithms which are defined on function spaces, as 
overviewed in [23] and to use Galerkin discretizations of the forward map which employ 
Riesz bases in these function spaces. The second idea is that sparse, deterministic 
parametric representations of the forward mapu H- Q(u), as in [3H2TJII], can significantly 
reduce b/a and allow, therefore, for reducing computational complexity, as the sparse 
approximation of Q can be precomputed prior to running the Markov chain, thereby 
decreasing the constant b [T71 [151 US]- The third idea is that the sparse representation 
of the forward map can be truncated adaptively at different discretization levels of the 
physical system of interest. Then, we propose a multilevel Monte Carlo acceleration of 
the MCMC method in the spirit of [9] and prove that this allows further improvement 
of the computational complexity. 

The paper is organized as follows. In Section [2] we discuss the Bayesian formulation 
of inverse problems and describe and analyze an MCMC method which is defined on 
metric spaces. Section [3] is devoted to the specific elliptic inverse problem which we study 
for illustrative purposes. In section H] we study what we term the standard MCMC 
method where the forward map £?(■) is computed at each step of the Markov chain. 
Section [5] is concerned with showing that the complexity of MCMC can be reduced if 
we precompute a parametric representation of the forward map Q prior to running the 
Markov chain, and simply evaluate it at each step of the chain, leading to the improved 
MCMC method. Finally, in Section El we combine this idea to further improve the 
efficiency of the MCMC algorithm. To this end, we employ the hierarchic nature of the 
gpc-Finite Element Galerkin discretizations. We combine discretizations on multiple 
levels I = 0, 1,...,L and combine these judiciously with a level dependent sample size 
M(_. We show that this leads to a multilevel MCMC-gpcFE method that can significantly 
improve the overall algorithmic complexity under certain assumptions. 

We will use standard notation: B k denotes the sigma algebra of Borel subsets of 
]R fc . For a probability space (Q,A,p) consisting of the set Q of elementary events, a 
sigma algebra A and a probability measure p, and a separable Hilbert space H with 
norm || • ||# and for a summability exponent < p < oo we denote by L p (Q,p; H) the 
Bochner space of strongly measurable mappings from Q to H which are p-summable. 

Because of the various discretizations employed, and in particular the multi-level 
structure of some of these, it will be helpful to the reader to have a clear overview of the 
notation used to describe the range of probability spaces which arise, both through the 
Bayesian formulation of the inverse problem, and the Markov chains used to probe 
it. We now give such an overview. Throughout the paper, we denote by E M the 
expectation with respect to a probability measure /i on the subspace U containing the 
unknown function u. In the following we will finite-dimensionalize both the subspace 
U, in which the unknown function u lies, and the space containing the response of 
the forward model. The parameter J denotes the truncation level of the coefficient 
expansion (fT5l) used for the unknown function, and the parameter I denotes the spatial 
finite element discretization level introduced in section HI The parameters N and C 
denote the cardinality of the set of the chosen active gpc coefficients and the set of 
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finite element discretization levels for these coefficients in section The measure /x will 
variously be chosen as the prior p, the posterior p s , and various approximations of the 
posterior such as p J> '' 5 . 

We denote by V u (o), V J f 0) and V N fa probability measures on the probability space 
generated by the MCMC processes detailed in the following, when conditioned on the 
initial point u^°\ The acceptance probability for the Metropolis-Hastings Markov chain 
is defined as a in ([6]), a J ' 1 in (13 ip . and a N ' c in (jHJ) for the problems on the full 
infinite dimensional space and its truncations. We then denote by E u (p), £ ( ) an d £„(o) 
expectation with respect to V u (o) , V^f 0) and V N (^ respectively. 

If the initial point of these Markov chains is distributed with respect to an 
initial probability measure \i on U, then we denote the probability measure on the space 
that describes these Markov chains by V^, V^ ,J ' 1 and 7-"^' , and the corresponding 
expectation accordingly by £ M , £^> J ' 1 and ' . 

Finally, in Section [61 we will work with the probability measure on the space 
that generates a sequence of uncorrelated Markov chains created by the multilevel- 
MCMC procedure, and with <£ L , the expectation with respect to this probability 
measure. The definition of these measures will be given at the beginning of Section 

El 



2. Bayesian inverse problems in measure spaces 

On a measurable space (U, O) where is a a-algebra consider a measurable map 
Q : U — > (R k ,B k ). The data 5 is assumed to be an observation of Q subject to an 
unbiased observation noise 

5 = g{ u ) + i9. 

We assume that d is a centred Gaussian with law iV(0, S). Let p be a prior probability 
measure on (U, 0). Our purpose is to determine the conditional probability P(n|5) on 
(U, 0). The following result holds. 

Proposition 1 Assume that Q : U — > M fc is measurable. The posterior measure 
p 6 (du) = F(du\S) given data 5 is absolutely continuous with respect to the prior measure 
p(du) and has the Radon-Nikodym derivative $M) with $ given by 

This result is established in Cotter et al.[8] and Stuart [23]. Though the setting in [8] 
and [23] is in a Banach space X, the proofs of Theorem 2.1 in [8] and Theorem 6.31 of 
[23] hold for any measurable spaces as long as the mapping Q is measurable. 

To study the well-posedness of the posterior measures, that is continuity with 
respect to changes in the observed data, we use the Hellinger distance, as in Cotter 
et al. [Sj; see below for the definition. In that paper it is proved that when U is a 
Banach space, if the prior measure p is Gaussian, and under the conditions that $ 
grows polynomially with respect to u, and is locally Lipschitz with respect to u fixing 
y and with respect to y fixing u, in the second case with Lipschitz constant which also 
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grows polynomially in u, then the posterior measure given the data 8, i.e. p s , is locally 
Lipschitz in the Hellinger distance <iHcii : 

cWpV) <c\6-6'\ , 

where (recall) | • | denotes the Euclidean distance in The Fernique theorem plays an 
essential role in the proofs, exploiting the fact that polynomially growing functions are 
integrable with respect to Gaussians. In this section, we extend this result to measurable 
spaces under more general conditions than those in Assumption 2.4 of Cotter et al. [8]; 
in particular we do not assume a Gaussian prior. We make the following assumption 
concerning the local boundedness, and locally Lipschitz dependence of $ on 8. 

Assumption 2 Let p be a probability measure on the measure space (U,Q). The 
function $ : U xK 1 -)! satisfies: 

(i) for each r > 0, there is a constant M(r) > and a set U(r) C U of positive p 
measure such that for all u G U(r) and for all 8 such that \8\ < r 

< $0;<T) < M(r); 

(ii) there is a mapping G : Ex U h-» E such that for each r > 0, G(r, •) G L 2 (U, p); and 
for every \8\, \5'\ < r holds 

\<$>{u)5)-${u-8')\ <G(r,u)\8-8'\ . 

Under Assumption [2} the definition (J2J) of the posterior measure p 6 is meaningful as we 
now demonstrate. 

Proposition 3 Under Assumption dj, the measure p s depends locally Lipschitz 
continuously on the data 8 with respect to the Hellinger metric: for each positive constant 
r there is a positive constant C{r) such that if \8\, \8'\ < r, then 

duen(p S ,p 5 ')<C(r)\8-8'\. 

Proof Throughout this proof K(r) denotes a constant depending on r, possibly changing 
from instance to instance. The normalization constant in (J2J) is 

Z{8)= [ exp(-$(u; 8))dp{u) . (4) 
Ju 

We first show that for each r > 0, there is a positive constant K(r) such that 
Z{8) > K(r) when \8\ < r. From (j4]) and Assumption [2]^i) it follows that, that when 
\5\<r, 

Z(8) > p(U(r)) exp(-M(r)) > . (5) 
Using the inequality | exp(— x) — exp(— y)\ < \x — y\ which holds for x, y > we find 

\Z(8) -Z{8')\ < [ \${u;8) - $(u; 8')\dp{u) . 
Ju 

From Assumption Etli), 

|$(«;<J)-$(u;<?')| <G(r,u)\8-8'\ . 
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As G(r, u) is p-integrable, there is K{r) such that 

\Z{5) -Z{5')\ < K(r)\8-8'\ . 
The Hellinger distance satisfies 

2cWpV? = ^(z(5)- 1 / 2 exp(-^( U ;5)) - Z{8')' 1/2 exp -i$(u; 5')) 2 ^(«) 



'(7 

< /l + / 2 , 



where 



and 



2 /" / 1 1 \ 2 

h = Zj6) J^M-^iu-^)) - exp(--^(u;5'))) dp{u) 



h = 2\Z(5y 1/2 - Z(5')- 1/2 \ 2 / exp(-$(u;6'))dp(u). 

Ju 

Using again | exp(— x) — exp(— y)\ < \x — y\, we have, for constant K(r) > 0, 
h < K{r) [ |$(u; tf) - 5')| 2 dp(w) 

< / (G(r,u)) 2 dp(u)|5-5'| 2 < A(r)|5-5'| 2 . 
Jx 

Furthermore, 

\Z(5y 1/2 - Z(5T 1/2 \ 2 < K{r)\Z{8) - Z(5')\ 2 < K{r)\8 - 8'\ 2 . 

The conclusion follows. □ 
We now introduce a Metropolis-Hastings MCMC method designed to be reversible 
and ergodic with respect to the posterior measure p s : to this end, given data 8, we 
define for any u,v EU 

a(u, v) — 1 A exp($(u, 8) - 8)) . (6) 

The Markov chain {u^ k '}'^L 1 C U is then constructed as follows: given the current state 
we draw a proposal independently of vS k > from the prior measure p appearing in 
02]). Let {w ^}fc>i denote an i.i.d sequence with W[0, 1] and with independent 

of both and v^ k '. We then determine the next state -u( fc+1 ) via the formula 

= l(a(/),t;W) > w^v™ + (l - l(a(n«,/') > .(7) 

Thus we choose to move from to with probability a{u^ k \ v w), and to remain 
at with probability 1 — a(itW, ^w). We claim that ([7]) defines a Markov chain 
| M (fc)joo^^ which is reversible with respect to p s . To see this let u(du,dv) denote the 
product measure p s (du) <S> p(dv) and u^(du,dv) = u(dv,du). Note that v describes the 
probability distribution of the pair onU xU when is drawn from p" 5 , and 

i/t designates the same measure with the roles of u and u reversed. These two measures 
are equivalent (as measures) if p 5 and p are equivalent, and then 

dv^ 

— (u,u)=exp($(u; <*)-$(«;£)) , (u,v) E U xU . (8) 
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From Proposition 1 and Theorem 2 in [21] we deduce that (J2J) is a Metropolis- 
Hastings Markov chain which is p 5 reversible, since a(u,v) given by (j6]) is equal to 
min{l, ^rp(u, v)}. Since is chosen independently of the current state the Markov 
chain is, in fact, an independence sampler. We now give sufficient conditions which 
render the Markov chain ergodic. 

Theorem 4 Let Assumption^ hold with U{r) = U. Then p s is equivalent to p so that 
the Markov chain |7p is well-defined and reversible with respect to p s . Ifp(u, A) denotes 
the transition kernel for the Markov chain, and p n (u,A) its n th iterate, then for all 
n <EN, 

\\p n (u, A) - p 5 \\ TY < 2(1 - exp(-M (r))) n . 
For every bounded, continuous function g : U — > M., there holds, V u (o) almost surely, 
1 M 

-^^(^) = E^[^H] + c6,M-i (9) 

k=l 

where £m is a sequence of random variables which converges weakly as M — )■ oo to 
£ ~ N(0, 1) and where c is a positive constant which depends only on M(r) and on 
sup ng [/ Furthermore, we have the mean square error bound 

M 

,9 

k=l 

Proof Equivalence of p s and p follows since the negative of the log-density is bounded 
from above and below, uniformly on U: 

< $(u) < M(r) VueU . (10) 

Using these bounds and ([6]) it also follows that the proposed random draw from p has 
probability greater than exp(— M(r)) of being accepted. Thus 

p(u, A) > exp(-M(r))p(A) Vu e U. 



1 



1/2 

< CM~ 1/2 



The first result follows from [18], Theorem 16.2.4 with X — U. The second result follows 
from [18], Theorem 17.0.1. To see that c in (J9l) is bounded by a constant that depends 
only on M(r) and sup uG[/ ^(w)!, we note that it is given by 

oo 

c 2 = £'\g{vP>)\* + 2j2£ pS [9(^)g(u^)} (11) 

n=l 

where the function g is defined as g = g — K pS (g). Now 

oo oo 

2 J2 & &(« (0) )S(« (b) )] < 2 sup \g{u) \W> £ \S u{0) [g(u^)} \ 



n=0 n=0 



< 2 sup |^)|E" 4 1^(0) \9{u^)} - W S [g}\ 

u „ 
n=0 

oo 

< 4 sup \g(u)\ 2 ]T(1 - exp(-M(r))) n . 

u _ 
n=0 
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For the mean square approximation, using the stationarity of the Markov chain 
conditioned to start in U 3 ~ p s , we have 



M 



M M 



k=l 



k=l j=k+l 
M-l M-k 

k=0 j=l 
M-l M-k 



< E' [g(u 



M-l 



k=0 j=l 



M-k 



+ 2 h E su p E ^o^^c^)] - e// mi] 

fc=o " i=i 

1 M-l M-fe 

< E^(« {0) ) 2 ] + 4- £ sup |sW| 2 £ (1 - exp(-M(r))) J 



fc=0 



< E^[^(n(°)) 2 ] +4su P |^(m)| 2 ^(1 - exp(-M(r))) j , 

j=i 

which is clearly bounded uniformly with respect to M. Thus we have shown that there 
exists C > such that 

M 



M 



E 

k=l 



g[u 



(ky 



< 



C 
M 



It remains to show that the expectation with respect to the unknown posterior p s can 
be replaced by an expectation with respect to the prior measure p. 
To this end we note that 



M 



£ p E^ (fe) : 



k=l 



M 



s, 



((» 



E^ (fch 



k=l 

M 



/ £ u(°) E 
Ju Ll k=i 



g(uW) 



dp{u 



dp 



(oh 



^ (0 W(- (o: 



M 



fc=i 



sup exp($(u;5)) 



As < 1 and $(•; 5) is assumed to be bounded uniformly, we deduce that 

M 

8 P 



k=l 



21 C 
< . 

- M' 



for a constant C independent of M. The conclusion then follows. 



□ 



Remark 5 A key observation in the previous theorem is that all constants depend only 
on M(r) and on the supremum of g. Hence, if we can show for finite element and 
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Karhunen-Loeve approximations of our forward map Q in the physical domain D that 
we obtain uniform upper bounds on our approximation o/$, then the conclusions of the 
preceding theorem will hold with constants that are uniformly bounded with respect to 
the approximation parameters. 

3. Model Elliptic Inverse Problem 

In the remainder of this paper, we develop the previously outlined abstract ideas for 
a specific class of model elliptic inverse problems where the unknown parameter is the 
diffusion coefficient and where each realization of the observed data comprises finitely 
many continuous, linear functionals of the forward solution. However we hasten to 
add that a similar analysis is possible for other PDE inverse problems. For example, 
for linear parabolic or second order hyperbolic PDEs, the analysis of the parametric 
forward problems, central to the approach developed herein, is available in [131 E]- 

We start by discussing the forward problem of elliptic PDEs with random 
coefficients, enabling the construction of a prior measure on an infinite dimensional 
space of unknown coefficients. We then show how this prior may be combined with 
properties of the forward solution map to obtain a well-defined inverse problem. 

3.1. A Class of Elliptic Problems With Random Coefficients 

Let D be a bounded Lipschitz domain in M d . For / G L 2 (D), we consider the elliptic 
problem 



Throughout we assume that the domain D is a convex polyhedron with plane sides. The 
coefficient K(x, u) is a random field from the probability space S, P) to L°°(D). We 
assume that the random coefficient K(x, u) can be represented by a sequence of pairwise 
uncorrelated independent random variables : Q — >- [ — 1, 1] in the series expansion 



Here, the sum is either finite or infinite. To render ( Tl3|) meaningful, we impose the 
following assumptions on K and ijjj. 

Assumption 6 The functions K and ipj in ijTS)) are in L°°(D) and there exists a 
positive constant k such that 



where K min = essinf x i^(x) > 0. 

It follows from this assumption that there exist finite positive constants K m [ n and K x 
such that for all u G Q 



V • (K(x, u)VP(x, u)) = f{x) in D, P = on dD . 



(12) 




(13) 




Kmin < K(X, U) < K max , Vx G D. 



(14) 
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In particular we may choose K min = K min /(1 + K) and K max = esssup x K(x) + nK rnin /(l + 
«). 

We denote by U = [—1, 1] N the set of all sequences u = (itj)i>i °f coordinates 
taking values in [—1, 1] and note that this is the unit ball in £°°(N). We equip the 
parameter domain U with the product sigma algebra = &)°° =1 B([—1,1]). On the 
measurable space (U, 9) thus obtained, we define the countable product probability 
measure 

where duj is the Lebesgue measure on [—1,1]. As Uj are uniformly distributed on 
[—1, 1], the measure p is the law of the random vector u = (ui, u 2 , . . .) in U. As random 
variables Uj{u) in the sequence u were assumed independent, the probability measure 
on realizations of random vectors u G U is a product measure: for S = Ylj>i 

p(S) = l[¥({uj: Uj eS ] }). 

For each u G U, we define the parametric, deterministic coefficient function 

K(x,u) = K(x) + ^2u j ip j (x) . (15) 

Due to Assumption [61 for any u £ U the series ( TT5|) converges in L°°(D). Therefore, for 
each u G f/, we consider the model parametric, deterministic diffusion problem in D 

-V ■ (K(x,u)VP(x,u)) = f(x) in D, P = on dD . (16) 

We let V = Hl(D), whilst V* denotes its' dual space. We equip V with the norm 
||-P||v — II VP||l2( D ). By ( fl3l) . K(x, u) is bounded below uniformly with respect to 
(x,u) £ D x U and we infer for every u G £7 

# m ijP(-,«)||^ = if mm (VP(-,n),VP(-,n)) <(/!-(-, u)VP(-,«),VP(-,«)) 



(/,P(-, M ))<^1||P(-, M ) 



*-min 

It follows that 



sup||P(-,n)|| y <Mb£l. (17) 



The solution P{-,u) of (TT6|) is the law of the solution P{-,oo) of the equation (TT21) 
as the following result shows. 

Proposition 7 Under Assumption {Si the solution P : U ^ V = Pq(P) is Lipschitz 
when viewed as a mapping from the unit ball in £°°(N) to V . It is in particular 
measurable, as a mapping from the measurable space (U,Q) to (V,B(V)). 

Proof From ( TToT) . we have for every <ft G V 



/ K(x, u)(VP{x, u) - VP(x, u')) ■ V<p{x)dx 
Jd 

= / {K{x,u') - K{x,u))VP{x,u') -V<p{x)dx . (18) 
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Again using ( I14p . i.e. that K(x,u) is bounded below uniformly with respect to 
(x, u) G D x U, it follows that there exists C > such that for all u G U 

\\P(; U) - P(; U')\\ V < C\\P(; u) \\ V \\K(; U ') - K(; u) || £ cc (23) . (19) 

Due to ( TTTj) . it follows from ( |T9l) that there exists a constant C > such that 

VueU: \\P(-,u) - P(;u')\\ v < C\\K(-,u') - K(;u)\\ L ~ {D) . (20) 
From (jl~3l) and Assumption [6] it follows with C > as in (120!) that 



|P(-,«)-P(.^')llv<CE 



/ 



>1 



< C\\U - n'||^oo (N ) ^ Wi'jllL^iD) 

< C— — K min \\u — u'We^^ . 

This establishes the desired Lipschitz continuity, which implies the asserted 
measurability. □ 

3.2. Bayesian Elliptic Inverse Problem 

We now define the Bayesian inverse problem. For Oi G V*, i = 1, . . . , k, which denote 
k continuous, linear "observation" functionals on V, we define a map G : U ->• R k as 

U 3 u i y g{u) := (d(P(-, «)), 2 (P(-, «)),..., O k (P(; u))) G M fc . 

By $ we denote an unbiased noise which follows a Gaussian distribution iV(0, E) in M k . 
We consider the observed data 5 for Q{u) subject to the noise i.e. 

5 = G{u) + d 

and define $ as in Q. We take as prior on u the measure p defined in the preceding 
subsection. The posterior measure on u given 5 can be explicitly written. 

Proposition 8 The conditional probability measure p s (du) = ¥(du\5) on U satisfies 

-j- oc exp(— $(w; 5)) . 
dp 

Furthermore, for 5,5' G M fc such that \5\, \5'\ < r there exists C = C(r) > such that 

dn c n(p S ,p S ')<C(r)\5-5'\ . 

Proof We have 

Vu,u' G U : \Q{u) - g{u')\ < cmax{||Oi||v.}||P(.,u) - P(;u')\\ v . 

i 

From fl20l) there exists a constant c > such that 

Vu,u'eU: \G{u)-g{u')\ <C\\K{;U)-K{;U')\\ L -{D) ■ 

Proceeding as in the proof of Proposition [TJ we deduce that G as map from U C £°°(N) 
to M fc is Lipschitz and, hence, p-measurable. We then apply Proposition [1] to deduce 
the existence of p 5 and the formula for its Radon-Nikodym derivative with respect to p. 
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For the well-posedness of p s , we verify Assumption [2J For the function Q(u) we 

have 

\g(u)\<max{\\O i \\ v .}\\P(-,u)\\ v . 
From (jTTj) sup{|^(w)| : u G U} < oo. We note that for given data S, there holds 

VueU: mu;5)\<^(\5\ s +\g(u)\x) 2 

and hence, since is uniformly bounded in U, the set U(r) in Assumption^) can 

be chosen as U for all r. We have, for every u G U, 

|*(u; S) - *(«; 5')\ < l\(X- 1/2 (6 + 5'- 2g(u)), ^ 2 (5 - 5'))\ 

<^l|S- 1/2 |lW ) (|5| + |5l + 2|^(u)|)|5-5'|. 
Choosing the function G(r,u) in Assumption EJ^ii) as 

G(r,u) = ±\\Z~ 1 / 2 \\ 2 mkMk) (2r + c), 

for a sufficiently large constant c > 0, we have shown that Assumption EJ^ii) holds. From 
Proposition [31 we get the conclusion. □ 

Remark 9 In the the preceding proof we have shown that \Q(u)\ is uniformly bounded 
foru in U . As a consequence there exists M(r) > which is a uniform bound on 5) 
for all \5\ < r and for all u G U . This bound is also uniform with respect to truncation 
of the infinite series ( T73j) for K , since this corresponds to a particular choice of some of 
the coefficients ofudU, and with respect to finite element approximation of the solution 
of (E2|), since the uniform upper bound on \G(u)\ will hold in finite element subspaces. 

4. Standard MCMC 

We study computational complexity of the MCMC method defined by ([7j) to sample the 
conditional probability measure p s determined in the previous section. The complexity 
results will be obtained here for the model scalar, elliptic inverse problem (1121) . We 
mention again that analogous results (with identical proofs) hold for inverse problems 
for general second order elliptic problems. While the complexity analysis of the MCMC 
algorithm is of independent interest, the results in the present section will be the 
foundation for several accelerations of the basic MCMC algorithm presented in Sections 
[5] and [6] ahead. In order to obtain a constructive version of the MCMC algorithm, we 
will approximate solutions of the forward problem (Tl2|) by applying the Finite Element 
Method in the physical domain D to its parametric version ( Tl6|) and by truncation of 
the polynomial expansion of the diffusion coefficient K given by ( 1131) . To obtain error 
bounds for the Finite Element discretization of the parametric forward problem (fl6l) in 
the domain D, we require differentiability of the coefficient functions ipj{x) with respect 
to the spatial variable x. 
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Assumption 10 The functions K and ipj in l[T3\) are in W 1 ' 00 ^) and 

^2 \\^j\\w^{D) < oo . 
i>i 

Furthermore, we assume that there exist positive constants C and q such that for all 
J G N, the sequence {ipj} in (IS) satisfies 

j>J 

Assumption [10] shall be imposed throughout what follows. From Assumption [10] and 
from (P3J), we deduce that K(-,u) G W 1 '°°(D) for all ueU. 

4-1. FE Approximation of the Forward Problem 

We describe an approximation of the forward problem based on finite element 
representation of the solution P of ( IT6|) . together with truncation of the series ( 1X5]) . 
We start by discussing the finite element approximation. Recalling that the domain D 
is a bounded Lipschitz polyhedron with plane sides, we denote by {T 1 }^ the nested 
sequence of simplices which is defined inductively as follows: first we divide D into a 
regular family T° of simplices; the set of regular simplices T l is determined by dividing 
each simplex in into 2 d subsimplices. Based on these triangulations, we define a 
nested multilevel family of spaces of continuous, piecewise linear functions on T l as 

V 1 = {ueV : u\ T e Vi(T) VT G T 1 }, 

where Vi(T) denotes the set of linear polynomials in the simplex T G T l . Approximating 
the solution of the parametric, deterministic problem (|T6l) from the finite element 
spaces V 1 introduces a discretization error which is well-known to be bounded by the 
approximation property of the V 1 : there exists a positive constant C > which is 
independent of I such that for all P G W = (H 2 fl Hq)(D) and for every < hi < 1 
holds 

inf \\P-Q\\v<Chi\\P\\HU {D) , (21) 

Q&V 1 

where hi = 0(2~ l ) = max{diam(T) : T G T 1 } is the mesh width of triangulation T l and 
where the constant C > depends only on the shape regularity of T l . 

To bound the cost of the Finite Element discretization, we assume in the following 
that the union of all Finite Element basis functions w l j of the spaces V 1 = span{u^ : j = 
1, Ni}, I = 0, 1, 2, constitutes a Riesz basis in V. We remark that such bases are 
available in two and three dimensional polyhedral domains (see, e.g., [20J) (the following 
assumption of availability of V^-stable Riesz bases is made for convenience, and may also 
be replaced by the availability of a linear complexity, optimal preconditioning, such as 
a BPX preconditioner). 

Assumption 11 (Riesz Basis Property in V) For each I G No there exists a set of 
indices I 1 C N d of cardinality Ni = 0{2 ld ) and a family of basis functions w l k G Hq(D) 
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indexed by a multi-index k G I 1 such that V 1 = spem{w l k : k G I 1 }, and there exist 
constants C\ and c<i which are independent of the discretization level I such that ifwEV 1 
is written as w = J2kei l c k w i e V l > then 

Cl J]|cL| 2 < \\W\\ 2 V <C 2 J2\c l k\ 2 - 

kel 1 k£l l 

Multiscale Finite Element bases entail, in general, larger supports than the standard, 
single scale basis functions which are commonly used in the Finite Element Method, 
which implies that the stiffness matrices in these bases have additional nonzero entries, 
as compared to 0(dimV ) = 0(2 dl ) many nonzero entries of the stiffness matrices that 
result when one-scale bases, such as the hat functions, are used. 
To bound the number of nonzero entries, we shall work under 

Assumption 12 (Support overlap) For all I G No and for every k G I 1 , for every 
V G No the support intersection supp(w l k ) D supp(w k ,) has positive measure for at most 
0(max(l, 2 l values of k' . 

We now discuss the effect of dimensional truncation, ie. of truncating the infinite series 
for the coefficient K of problem f|T6|) after J terms, as 

j 

K J (x, u) = K(x) + u i^j(x) ■ (22) 
j'=i 

To this end, we consider the approximating diffusion problem 

- V • (K J (x, u)VP J (x, u)) = f(x), P J = on 3D . (23) 
From (I19p . there exists a constant C > such that J G N and all u G U 

SU Vu€U ||P(-, U) - P J (; U)\\ v < C\\P{; u) \\ V \\K{; u) - K J (; u)\\ L ~ {D) 



< C\\P{.,u)\\ v J^<jg-J-*\ L 



(24) 



To bound the error incurred by Finite Element discretization, we require regularity of 
P(-,u). Assumption [TU1 implies the following regularity results. 

Proposition 13 If D is convex and f G L 2 (D), and if AssumptionlW\ holds, then, for 
every u G U, the solution P J (-,u) of (TJ2|) belongs to the space W := H 2 (D) n Hq(D) 
and there exists a positive constant C > such that 

sup sup \\P J u)\\ w <C\\f\\ L 2 {D) . 

JeN uau 

Proof By (1X41) . K J (x,u) > K mm > and we may rewrite the PDE in ( 1231) as 

-AP J (x, u) = ) A f{x) + VK J (x, u) ■ VP J (x, u)). 
K J {x, u) 

By our assumptions, the right hand side is uniformly bounded with respect to u G U in 
the space L 2 {D). As the domain D is convex, we deduce that P J are uniformly bounded 
with respect to J and u G U in the space W: it holds 

su P«ef/ ll ApJ (-> M )IU 2 (D) < ^-^supsup [||/||l2(£>) + \\K J (-,u)\\ w i, x{D) \\P J (-,u)\\ v ] 

-ft-min u£U J>1 
< C < OO . 
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The desired, uniform (w.r. to J and I) bound in the W norm then follows from the 
Vy 1 '°°(Z})-summability of the ipj implied by Assumption [TOj the L? bound on Am and 

(DH>. □ 

Finally, we consider the finite element approximation of the truncated problem ( 123]) : 
given J, I £ N, find P J ' l (-, u) £ V 1 such that 

/ K J (x,u)VP J ' l (x,u) ■ V<p(x)dx = [ f(x)(f>(x)dx, V<f)eV l . (25) 

J D J D 

By the uniform positivity ( |T4|) of K J (x,u), the following error estimate holds: 

\\P J (;U)-P J ' 1 (;U)\\ V < C2- 1 \\P J (;U)\\ W . (26) 



Therefore combining (j24j) and f[26|) and repeating the argument in the proof of ([T 
we obtain: 



Proposition 14 Consider the approximation of the elliptic problem (Tty) via the gpc 
finite element solution of the truncated problem [2^1 , under Assumptions® [70 andUli 
Then there exists a constant C > such that for every J, I £ N and for every u £ U it 
holds that 

SU P \\P(;U)-P J > 1 (;U)\\ V < C (2~ l \\P J (■ , U )\\ W + J^\\P(- , u)\\ V ) . (27) 

ueu 

Moreover, the Finite Element solutions P J,l (-,u) are V -stable in the sense that 

supsup||P^(-,«)||^<-^||/||^. (28) 



K, 



nun 



4-2. FE Approximation of the Posterior Measure 

We denote the vector of observables from the discretized parametric system's forward 
solution map by 

G J ' l (u) = (O^P^iu)), . . . , O k (P J ' l (u))) :U^R k 

and define the function 

&\u ] 5)= l -\5-g J \u)\l. (29) 

We define an approximate conditional posterior probability measure p J ' l ' s on the 
measurable space (17, O) as 

^— ocexp(-$ J ' / ( M ;5)) . 
dp 

The measure p J,l,s is an approximation of p s , with error in the Hellinger metric which 
scales with J and / as the forward error in Proposition [HI We show this in the following 
Proposition whose proof generalizes the method introduced in [8] . 

Proposition 15 If the domain D is convex and if f £ L 2 (D), there exists a positive 
constant c depending only on the data 5 such that 

4ieii(pV' M ) <c(5)(J-« + 2~ l )\\f\\ LHD) . 
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Proof We denote the normalizing constants as 

Z{5)= [ exp(-$(u;£))dp(u), Z J > 1 {6) = j exp(-$ J ''(u; S))dp{u) . 



We then estimate 



= / (z(5)-^ eX p(-I$( U ;5))-(Z^))-V2exp(-i^( W ;5)))V«) 

where we defined 

h := ^^( exp (~ 2^'^) -exp (- ^$ J ''(M;5))) 2 rfp( M ), 

/ 2 :=2|Z( ( 5)" 1/2 -Z J ''(«5)" 1 / 2 | 2 f ex P (-<5> J > l (u;5))dp(u) . 

if/ 

We estimate I\ and J2. To bound Ii, given data 5, for every u & U holds 

exp ( - 5)) - exp ( - V'<( W ; 5)) | < ~|d>( U ; 5) - $ J ^; 5)| 

< c(2|5| + \Q{u)\ + |0 J ''(«)|)|0(u) - G J \u)\ . (30) 

Moreover, by Proposition (fl4j) . there exists a constant C > independent of J and of / 
such that, for all u E U , there holds 

|S( W )-<^( W )| <Cmax{||O i || y ,}||P(-, M )-P J ''(-, M )|| y 

<C(2^||P J (-,u)||^ + J^||P(-,u)lk) • 

By (iT7|) and Proposition [T3| ||P(-,u)||y and ||P J (-, w)||w are uniformly bounded with 
respect to u G U, so that 

/l < C(5)E^2- 1 \\P J (;U)\\ W + J^\\P(;U)\\ V ) 2 

< c(<5)(J^||/H 2 >,+2^||/||i 2(D) ). 

To estimate term J 2 , we observe that there is a positive constant c > such that for 
every J, / G N holds 

iZ^r 1 / 2 - Z" 7 ''^)" 1 / 2 ! 2 < c(Z(5y 3 V Z J ' ? ((5)- 3 )|Z(5) - Z J ' l (5)\ 2 . 



We note that 



\Z{8) -Z J > 1 (5)\ < J | exp(-$(u; 5)) - exp(-$ J ''(u; 

< / |$(M;5)-$ J ''(M;5)|rfp(M) . 
it/ 

Therefore, as Z(<5) and Z J ' l (5) are uniformly bounded below for all 5, analysis similar 
to that for I\ shows that 

I 2 <c(2- 2l + J-^)\\f\\l 2{D) . 

Thus 

dn c n(p S ,p J ' 1 ' 5 ) < c(5)(2- 1 + J~«) \\f\\ LHD) . 

□ 
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4-3. Computational Complexity of Standard MCMC 

Given J, I G N, and data 5, we use the MCMC method (GO) to sample the probability 
measure p- 7 ^ 5 . In so doing we create a method for approximating integrals of functions 
g : U — > K with respect to p s . We use the following notation for the empirical measure 
generated by the Markov chain designed to sample p J < l < s : 

M 
k=l 

where the Markov chain (u^)k^N is generated from the process (J7J) with the acceptance 
probability being replaced by 

a J ' l (u,v) = 1 Aexp($ J ' / (n; ( 5) - § J ' l (v;5)) , (it, v) E U x U . (31) 

Given M e N we wish to estimate the MC sampling error 

W S [g]-E^ 5 [g]. (32) 

We develop in the following two types of error bounds as M — > oo for (1321) : a probabilistic 

j,i 
,(0) 



error bound for V J f 0) almost every realization of the Markov chain and a mean square 



bound. 

Proposition 16 Let g : U — >■ R be a bounded continuous function on U with respect to 
the supremum norm. Then, for every initial condition u^> and for V'^ l 0) -almost every 
realization of the Markov chain holds the error bound 

EP S g(u) - E p A "' S i9}\ < ciM-5 + C2 (J-* + 2" 1 ) 

where c\ < cs\C,m\, £m is a random variable (on the probability space generating the 
randomness within the Markov chain) which converges weakly as M — >• oo to £ ~ N(0, 1) 
and C2 is a non-random constant independent of M, J and I. 

Moreover, there exists a constant C4 (which is deterministic and depends only on 
the data 5, and which is, in particular, independent of J, I ) such that 

f g P,J,i [ | E P a _ E £ l ' s J 2 j \ 1/2 < C4 (M" 1/2 + r q + 2~ l ) (33) 

Here, the constant q > is as in AssumptionUU 

Proof As g is bounded, we have from Proposition [15] and properties of the Hellinger 
metric (specifically, from (2.7) in [8]) for every u G U that 

\W S g{u) - E^' S g(u)\ < c(g)d Uc n( P 5 , P J > 1 ' 5 ) < c(^)c(5)(J^ + 2"') . (34) 

Here, c(5) is as in Proposition [15] and c(g) depends on the supremum of g{u) over U, but 
is independent of J, /. By Theorem 0] (and Remarks [5] and [H]) we deduce the existence 
of a constant C > 0, independent of M, J and I, such that 

M 

\^ l ' S 9-^Y,9^ {k) )\<C\UM- 1 ' 2 
k=i 
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where £m converges weakly as M — > oo to £ ~ N(0, 1). Combining this with ( I34p gives 
the first assertion. 

To prove the mean square error bound ( |33l) . we define 

g{u) :=g{u)-W J > l '\g]. 

Due to the invariance of the stationary measures p- 7 ^ 5 , we may write 



^- cp J ' l ' s ,J,l 

M 



M 

[|E 



k=l 



1 M M 



M 



k=l j=k+l 
M-l M-k 



fc=o i=i 

. JMM-t 

= E^(«< >) 2 ] + 2^ £ £ ^■ , ^(«(°))^{5 ) ^(«W)]] 

k=0 j=l 

<W J ' l ' 6 [g{u^f] 

1 M-l M-fc 

+ 2 ^ E su p ia E ^ni^iU^) - 

fc=o i=i 

Af-1 Af-jfc 

< W J ' l ' S [g(u^n+A± £ sup |s| 2 £(1 - 

fc=o i=i 

where, as in Remark [5J due to sup ug(7 ||P J,/ (M)||y being bounded uniformly with respect 
to the (discretization) parameters J and I, the constant < R < 1 is independent of 
the parameters J and I. Since sup J/ E pJ ''' [gfV **) 2 ] is bounded independently of J and 
of Z, we deduce that 



sup 



As 



j£ £p J ' l > 5 ,J,l 



M 



1 

LI M 



M 



it 



< oo 



fe=l 



; i J U 



k=l 



k=l 
M 



U 



<i E^ 



n(„( k y 



k=l 
M 



dp{u 



dp 
dpj,i,s 



(0)^ 



(u< >)d/'V 0) ) 



<£ pJ ' l ' s ' J ' l \\J2~9 



u 



k=l 



Z J '*(5)supexp($ J ''(u;<5)) . 



As Z J \b) < 1 and as sup^gj^ is bounded uniformly with respect to J and 

I, we get the conclusion after using the bound from Proposition [T5] on the Hellinger 
distance between p 5 and p J ' /,<5 . □ 
We consider the case where g(u) = £(P(u)) with I being a bounded linear functional 
in V*. As 

\E pS [£(P(u))] - E pS [£(P J > l {u))} \ < c(r q + 2" 1 ) 
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and 

\E pS [£(P J ' l (u))} - E pJ ''' S [£(P J ' l (u))} \ < c(J~ q + 2~ l ), 

we have 

\W S [£(P(u))] - E pJ, '' S [£(P J ' l (u))]\ < c{J- q + 2- 1 ). 

We therefore perform an MCMC algorithm to approximate E pJ ''' s [£{P J,l {u))}. As 
£{P J,l {u)) and § J ' l (u;5) depend only on the finite set of coordinates {u\,...,Uj} in 
expansion (1T51) . we perform the Metropolis-Hastings MCMC method on this set with 
proposals being drawn from the restriction of the prior measure p to this finite set. 

Proposition 17 Let g[u) = £{P{u)) where £ is a bounded linear functional in V* . The 
approximate evaluation of the sample average j? 

Y.k=i^ pJ, \ u{k) )) h V the Markov Chain 
Monte Carlo Finite Element Method (MCMC-FEM for short) with M realizations of the 
chain, with Finite Element discretization in the domain D at mesh level I as described 
above, and with J -term truncated coefficient representation fifty) , requires 0{l d ~ 1 2 dl M J) 
floating point operations. 

Proof From Assumption [121 we deduce that the total number of non-zero entries of the 
stiffness matrix for solving the Finite Element equation ff25l) is 0(l d ~ 1 2 dl ). To compute 
each of these entries, we require O(J) operations for computing the coefficients K J at the 
quadrature points. Therefore the cost of constructing the stiffness matrix is 0(l d ~ 1 2 dl J). 
From Assumption [HI the Riesz basis property of the Finite Element basis implies that 
the condition number of the stiffness matrix is bounded uniformly for all I. Hence, the 
conjugate gradient method for the approximate solution of the linear system resulting 
from the Finite Element discretization with an accuracy comparable to the order of the 
discretization error requires 0{l d ~ l 2 dl ) float point operations. The total cost for solving 
the approximated forward problem at each step of the Markov chain requires at most 
0{l d ~ l 2 dl J) floating point operations. Computing £(P J ' l (u^)) requires 0(2 dl ) floating 
point operations. Since we draw M samples of the chain, the assertion follows. □ 
We have the following result. 

Theorem 18 Let Assumptions El E3 and [TD hold. For g{u) = £(P(u)) where £ is 
a bounded linear functional in V* , with probability PN doi {t) the conditional expectation 
E pS g{u) can be approximated using iVdof degrees of freedom to approximate the forward 
PDE and t 2 N^ MCMC steps (with a total oft 2 N^ { 2/d degrees of freedom), incurring 
an error of 0(N^/ d ), and using not more than 

a 2 h g (N do{ y~iNi:f +1 ^ d 

floating point operations, where 

nc't 1 

l im PN do M^ - 7 =exp(-x 2 /2)dx , 
jVdof->°° J -dt v2tt 



for some positive constants c,c' independent of A^j f- 
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In mean square with respect to the measure 

<pp,j,i } W [g{u)] can be approximated 
with an error 0(N^J d ), using not more than N^ { 2 ' d number of degrees of freedom in 
total, and not more than 0(\og(Na {) d ~ 1 N^ 2+1 ^ q ^ d floating point operations. Here, the 
constant q > is as in AssumptionlTR 

Proof We invoke the error estimate in Proposition [TBI an d choose the parameters M, 
J and I so as to balance the bounds M _1//2 , J~ q and 2~ l , taking into account the fact 
that the coefficient of is only known through its asymptotic normality. We select 
J = 2 l / q and M = t 2 N^ d where t = c 3 |£m|, with N do f denoting the number of degrees 
of freedom at each step being N^of = 0(2 dl ); the constant C3 and the random variable 
£m is as in Proposition [T6j Then the total number of floating point operations required 
as I ->■ 00 is not larger than 0(^d-i2(d+2+i/<z)Z) 

We then arrive at the conclusion. □ 



5. Sparse gpc-MCMC 

We again study computational complexity of the MCMC method defined by (JTJ) to 
sample the conditional probability measure p s determined in the previous section. 
However we study a computational method which effects a reduction in computational 
cost by precomputing the parametric dependence of the forward model, which enters 
the likelihood. This method is introduced, and used in practice, in the series of 
papers [T7J [151 US]- The major cost in MCMC methods is the repeated solution of 
the forward equations, with varying coefficients from the MCMC sampler of p s . The 
complexity of these repeated forward solves can be drastically reduced by precomputing 
an approximate, deterministic parametric representation of the system's response which 
is valid for all possible realizations of u £ U. Specifically, we precompute a sparse tensor 
finite element approximation of the parametric, deterministic forward problem, by an 
approximate polynomial chaos representation of the solution's dependence on u and by 
discretization of the forward solutions' spatial dependence from a multilevel hierarchy 
of Finite Element spaces in D. As we shall show, this strategy is particularly effective, 
if only linear functionals £(■) of the system's response are of interest: in this case, only 
scalar coefficients of the gpc expansion need to be stored and evaluated. We use this to 
reduce the cost per step of the MCMC method. We again work under Assumption [10] 
and, furthermore, we make the following assumptions throughout this section: 



5.1. Sparse Tensor gpc-Finite Element Approximation of the Parametric Forward 
Problem 

5.1.1. Best N term parametric approximation By ([IT]) , the solution P(-,u) of problem 
(fl6]l is uniformly bounded in V by ||/||v*/i^min- Therefore, from Proposition [7J 
we deduce that P(-, •) £ L 2 (U, p;V). Therefore, the parametric solution admits a 
polynomial chaos type representation in L 2 (U,p; V). To define it, we denote by L n {u n ) 
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the Legendre polynomial of degree n, normalized such that 

\L n (0\ 2 d£ = l. 



1 * 



By J 7 we denote the set of all sequences v — (ui, v 2l ■ ■ .) of nonnegative integers such 
that Uj are integers for all j and only a finite numbers of them are non-zero. We define 

L„(u) = Y[L„.( Uj ). (35) 

Since Lq = 1, for each v G J 7 the products contain only finitely many nontrivial factors. 
The set {L u : v G J 7 } forms an orthonormal basis for L 2 (U, p). We can therefore expand 
P(-,u) into the Legendre expansion 

where P v := fjj P(-, u)L v {u) dp(u) G V. By the L 2 (U,p) orthonormality of the set 
{L v : v G J 7 }, Parseval's equation in the Bochner space L 2 (U,p; V) takes the form 



VPeL 2 (U,p;V): \\P\\h { u*v) = Y,W P > 



2 

y\\V 



For the ensuing analysis, we shall impose the following assumption on the summability 
of the gpc expansion of P: 

Assumption 19 There exists a constant < p < 1 such that the coefficients P v of the 
gpc expansion of P satisfy (\\P u \\v)u £ ^(J 7 )- 

This assumption is valid under the provision of suitable decay of the coefficient functions 
ipj such as Assumption [101 We refer to (5J [6] for details. By a classical argument 
("Stechkin's Lemma"), this implies the following, so-called "best iV-term approximation 
property" . 

Proposition 20 Under Assumption UR there exists a nondecreasing sequence 
{A N } N£N C T of subsets A N whose cardinality does not exceed N, such that 

2 



p-J2 p ^ u ) L - 



= \\Pv\\v<CN- 2r , (36) 

L 2 (U,p;V) ^eT\A N 

where the convergence rate r = 1/p — 1/2 > 1/2 and where the constant C = 
\\(\\Pu\\v)ueJ r \\ep(^) is independent of N. 



5.1.2. Finite element approximation The best iV-term approximations 

uk n := ^2 P v( u ) L v ( 37 ) 

u<=A N 

in Proposition [20] indicate that sampling the parametric forward map with evaluation 
of iV solutions P u {u), v G An of the parametric, elliptic problem with accuracy N~ r is 
possible; since r > 1/2, this is superior to what can be expected from iV MC samples. 
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There are, however, two obstacles which obstruct the practicality of this idea: first, the 
proof of Proposition [20] is nonconstructive, and does not provide concrete choices for 
the sets A a? of "active" gpc coefficients which realize ( |36|) and, second, even if A at were 
available, the "coefficients" P u G V can not be obtained exactly, in general, but must 
be approximated for example by a Finite Element discretization in D. 

As u G L 2 (U,p; V), we consider the variational form "in the mean" of ffl6|) as 

/ / K(x, n)VF(x, u) ■ VQ(x, u)dxdp{u) = I I f(x)Q(x,u)dxdp(u), (38) 
Ju Jd Ju Jd 

for all Q G L 2 (U, p;V). For each set An C J 7 of cardinality not more than A" that 
satisfies Proposition [201 an d each vector £ = (l u )ueA N of nonnegative integers, we define 
finite dimensional approximation spaces as 

X N ,c = {Pnx = PuA^)Lu(u); P u>c G V 1 "} . (39) 
ueA N 

Evidently, X^,c C L 2 (U,p; V) is a finite-dimensional (hence closed) subspace for any A" 
and any selection £ of the discretization levels. 

The total number of degrees of freedom, A^o/ = dim.(X^x), necessary for the sparse 
representation of the parametric forward map is given by 



o ( 2dlv ) 



A^dof = O > 2 dlv as iV.l^oo. (40) 



The stochastic, sparse tensor Galerkin approximation of the parametric forward problem 
( |T6l) . based on the index sets A^ C J 7 , and £ = {l u : v G A^}, reads: find Pn,c £ ^iv,£ 
such that for all Qn,c £ X NjC holds 



KPn,c,Qn,c) ■= / / K(x,u)VP N ,c-VQ NiC dxdp(u) 

J V J j? (41) 

= / / f(x)Q N<c (x,u)dxdp(u). 
Ju Jd 

The coercivity of the bilinear form b(-, •) ensures the existence and uniqueness of Pn,c 
as well as their quasioptimality in L 2 (U,p; V): by Cea's lemma, for a constant C > 
which is independent of A and of £, 

\\P — Pn,c\\l 2 (u, P ;V) < C m f II -P — / , Qu,cL u \\l 2 (U. P ;V) ■ 

We obtain the following error bound which consists of the error in the best A^-term 
truncation for the gpc expansion and of the Finite Element approximation error for the 
"active" gpc coefficients. 

\\P ~ PN,c\\h(u, P ;V) < C(N~ 2r + V inf \\P V - Q W , C \\ 2 V ) . (42) 

The following assumption is, therefore, a stronger requirement than the mere p- 
summability of the gpc coefficient sequence {HP^Hyj^ejF. 
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Assumption 21 There are positive constants t, a and (3 such that with a total budget 
of iVdof degrees of freedom, and with at most N = N^l active gpc modes v, an active 
set of gpc modes An such that \u\ = O (log AT) V v G A^, and a combined gpc-Finite 
Element approximation Pn,c £ ^n,c with rate of convergence 

\\P ~ Pn,c\\l2(u, P ;V) < CN^f, 
can be found in 0(A^ ^(log Ndof) ) float point operations. 

Note that the constant C > in Assumption [2T] could, in general be considerably larger 
that the best possible constant in the Af-term approximation result Proposition [201 

Let us indicate sufficient conditions that ensure Assumptions [191 EH The first 
condition is quantitative decay rate of the coefficient functions ipj in the parametric 
representation ( fi"3l) of the random input. 

Assumption 22 The coefficients ipj are arranged in decreasing order of magnitude of 
H^jlU 00 !/)) and there is a constant s > 1 and C > such that 

VjeN: Wjh-w < Cj~ s . 

To obtain convergence rates for the FE-discretization in the domain D, i.e. of the 
term \\P V — Qv,c\\v m (H21), it is also necessary to ensure spatial regularity of the solution 
P(x,u) of the parametric problem f[T6|) . To this end, we require 

Assumption 23 For all j G N, ipj G W 1 ' 00 ^) and there exists a constant C > such 
that 

Vj G N : || y^j\\w 1 ' oo {D) < Cj~ s ' for some 1 < s' < s . 

We remark that Assumptions [22] and [23] imply Assumption [10] with q — s — 1 > 0. 
Under these assumptions, the following proposition holds. 

Proposition 24 Under Assumptions ^!^ \23j if, moreover, the domain D is convex and 
f G L 2 {D), the solution P(-,u) of the parametric, deterministic problem belongs 
to the space L 2 (U,p; W). 

From estimate f|4*2|) . we get with Proposition [2H and standard approximation properties 
of continuous, piecewise linear FEM the error bound 

\\P ~ Pn,c\\Ihu, P ;V) < C(N~ 2r + 2~ 2L \\PAm(D)) ■ (43) 

ueA N 

In order to obtain an error bound in terms of N^f defined in ([401 which is uniform in 
terms of N, we select, for v G A^r the discretization levels l v of the active gpc coefficient 
P v so that both terms in the upper bound (|4"3"]) are of equal order of magnitude. This 
constrained optimization problem was solved, for example, in [5], under the assumption 
that (\\P u \\ H 2 {D) ) u e£ p (T). 

In recent years, several algorithms have appeared or are under current development 
which satisfy Assumption [21] with various exponents a > 1 and > 0. We mention 
only the references O [211 HH 121 S] 
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5. 2. Approximation of the Posterior Measure 
For the solution Pn,c in Assumption [211 we define 

g N > c (u) = (Oi(p n M), . . . , o d (p N , c (u))), 

and the function 

^ c {u-5) = \\5-G^{u)\l. 

The conditional measure p N > c > 5 on the measurable space (U, 6) is defined as 

^— oc expC-^M)) • 
dp 

We then have the following approximation result. 

Proposition 25 Let Assumptions lTR l21\ hold. Then there is a constant c = c(8) which 
only depends on the data 8 such that 

dn c n(p S ,p N ' C ' 5 )<c(6)N^. 

Proof The proof for this proposition is similar to that for Proposition [T5l differing only 
in a few details; hence we highlight only the differences. These are due to estimates on 
the forward error from Assumptions [211 being valid only in the mean square sense whilst 
Proposition [TH holds pointwise for u G U. Nonetheless, at the point in the estimation of 
Ji and I2 where the forward error estimate is used, it is possible to use a mean square 
forward error estimate instead of a pointwise forward error estimate. From Assumption 
I2T1 we deduce that there is a positive constant c such that: 

p{u: \g(u)-Q N ' c (u)\>l}<cN- o f . 

As ||P(u)||v is uniformly bounded for all u, there is a constant Ci(S) > such that 
\S — G(u)\-e < ci(S). Choose a constant 02(6) > sufficiently large. If \5 — Q n,c {u)\y, > 
02(8), then 

\Q N ' c (u) - G(u)\v >\8- ^ £ (n)| s -\8- Q{u)\v > c 2 (8) - Cl (6) > 1 . 

Let Ui C U be the set of u e U such that \8 - Q N ' c {u)\x > c 2 {8). We have that 
p(£/i) < cN~ f. Thus, 

^ jT I exp ( - 8)) - exp ( - ^ N ' c (u; 8)) \dp{u) < c{8)N^ . 

Whenw £ U u \8~g N ' c {u)\^ < c 2 (8) so there is a constant c 3 (8) so that \G N ' c (u)\ < c 3 (8). 
An argument similar to that for ( )30|) shows that 

U, .A ( 1 



exp ( -^(u; 8) J - exp i-^ N ' c (u; 8) 

c (2\8\ + \g( u )\ + \g N ' c (u)\)\g{u) - g N > c ( u 



< 
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Therefore 

h = W)lu I ^ ( ~ " eXp ( " \^ N ' C ^ 5 

c / (2|5| + + c 3 (6)) 2 \g(u) - Q N > c {u)\ 2 dp{u) 

Ju 

< c(6)Ntf + c(S) J ||P(-, u) - P^(., «) \\ 2 v dp(u) 
To show that J 2 < c(5)A r ( ^ ) f T we still need to verify that 



25 



?N,C 



cxp( 



is uniformly bounded from below by a positive bound for all N and £. As Pn,c is 
uniformly bounded in L 2 (U,p; V), 



>N,Ct 



u)\dp{u) <c \\P Ni c(u)\\ v dp(u) < c. 



u 



NX i 



u ) > r 



Fixing r > sufficiently large, the p measure of the set u £ [/ such that \Q 
is bounded by c/r. Therefore the measure of the set of u £ U such that \Q N,c {u)\ < r 
is bounded from below by 1 — c/r. Thus we have proved 



Z^(5) > / exp(--(|5| s + |^ v ^(n)| E )^)dp(u) > c(5) > . 
Ju 1 

Let (u^)k be the Markov chain generated by the sampling process 
acceptance probability being replaced by 

a N ' c (u,v) = 1 Aexp($ 7V ' £ (n,5) - <$> N ' c (v,5)) . 

We denote by 



□ 

with the 

(44) 



E p 

AI 



1 M 

[9] = M^ 



9{u 



fc=i 



We then have: 

Proposition 26 Let g be a bounded continuous function from U to JR.. T/ien 

IE'* [g] - EC"" [g] I < c 6 M-^ + c 7 N-J { , (45) 

-pp N ' c ' s ,N,c a \ mos i surely, where Cq < c 8 \£m\ where £m is a random variable which 
converges weakly as M — > oo to £ ~ -/V(0, 1); the constants cj and c§ are deterministic 
and do not depend on M , N and N^ Q {. 

There exists a deterministic positive constant eg such that the gpc-MCMC converges 
in the mean square with the same rate of convergence 



£P,N,C 



N,C,S . 



*r\g\-E? M -\g\ 



1/2 
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Proof Using (144 p . the probability that a random draw from p has probability larger 
than exp(— Q N,c (v ; 5)) of being accepted. Therefore the transition kernel of the Markov 
chain generated by (CO) with the acceptance probability (H4l) satisfies 



p(u,A)> I exp(-$ N > c (v;5))dp(v) 



Using Theorem 16.2.4 of [18], we deduce that the nth iteration of the transition kernel 
satisfies 



/^'Utv < 2 I 1 



exp(-® N,C (v,5))dp(v) 



v 



lb>, 

From the proof of Proposition [25l we have 

exp(-$ N > c (v;5))dp(v) >exp(-c 2 (6) 2 /2) + cN^f . 

Thus, we can choose a constant R < 1 independent of the approximating parameters N 
and £, so that for all n G N holds 

||p>, •)- P^llxv < 2(1 - RY ■ 

In a similar manner as for Proposition [T6J, we deduce the probabilistic bound. For the 
mean square bound, similar to the proof of Proposition [THl we have 



£'- 



~,N,C 



^ [g] - e£ c ' s [g] ' < C{M-y* + NX f . 



dof) 



Let U' := {u G U : \Q ' (u) —Q{u)\ > 1}. We deduce that there exists a constant c > 



independent of £, Ndof, N such that p(U') < cN d ^7 and such that we may estimate 



(") 



gp,N,C 
£N,C 

'U' 

+ 
2r 



W [g]-El/ ' M 

W S [g]-EC C "[g] 2 ]dp{u 



(0)^ 



C7\C/' 



(0) 



W[g}-E p -'"[g] dp(uW) 



S 



U\U' 



;(0) 



NX.S 



,(o)^ 



5 



-(0) 



W \9] ~ K [9] ^ £ (5)exp($^(n;5))rfp^^(n(°)) . 

>U\U' 

On [/ \ U', sup ug(7 ^^^(m)! is uniformly bounded with respect to all N and C From 
this, we get the conclusion. □ 

Remark 27 In Proposition \2h\ g is assumed to be a bounded continuous function from 
U to R. The sparse-MCMC is of particular interest in the case where g is given by £oP 
where £ is a linear functional on V , i.e. £ G V*. From Assumptionl21\ and the fact that 



dp 5 
dp 



u 



Z{5) 



exp(-$(u; 5)) 



we deduce that 



\W [£{P{u))]-W [£(P N ,c{u))]\ < c(5,£)N- T f . 
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On the other hand, from Proposition flffi we have (cf. Eq. (2.7)]) 

\W s [£{P N , c {u))\ -W N ' c ' s [t{P NtC {u))]\ < c(5,£)N^ f . 
Therefore, by the triangle inequality, 

\E^[i(P( u ))]-W N ^[i(P N , c (u))]\ < c{5,£)NZ f . 

We wish to approximate K pN ' ' [£(Pn,c{ u ))] with a Markov Chain-Monte Carlo 
algorithm. In doing so, the following difficulty may arise: although £(P(u)) is uniformly 
bounded with respect to u G U , sup nG(7 £(Pn ! c( u )) ma V n °t be uniformly bounded with 
respect to N and C. However, we can still apply Proposition ESI by using a cut-off 
argument: to this end, we define the continuous bounded function g(u) : U — >■ R by 
truncation, i.e. 

( KPnA 1 ")) */ \KPnM)\ < ™P u€U \Z(P(u))\ + 1 » 

~ g (u) : = J s ^p 1^(^)1 + 1 if £(Pn,M) > sup uec/ \£(P(u))\ + 1 . 

-sup \£{P{u))\ - 1 if £(P N ,c(u)) < -su Pu6C/ \£(P(u))\ - 1 . 

^ u€U 

Define U' := {u G U : \£(P(u)) - £{P N) c{u))\ > 1}- From Assumption^ we find that 
p(U') < c(£, 5)N^ . It follows then that there exists a constant c > depending on the 
data 6, but independent of N and of C such that 

\E^ e [£(P N , £ (u))-~g(u)}\< [ \£{P N , c {u))-g{u)\dp N ^\y) 



<c{5) / I ut (u)(\£(P Nt£ (v))\ + c)dp(y) 
Ju 

< c (6)p(u') 1/2 (U(PN,M\\LHu, P m + c)< c(8)N-; f . 

Therefore, we may run the MCMC algorithm on E p 

At each step of the MCMC algorithm, we need to compute £(Pn,c( u ^)) which, 
for linear functionals £(■), is equal to Ez/eA^^^^l^ )■ Because the parametric 
solution of the elliptic problem can be precomputed before the MCMC is run, and then 
needs only to be evaluated at each state of the MCMC method, significant savings can 
be obtained. We illustrate this, using the ideas of the previous Remark ETJ to guide the 
choice of test functions. 

Proposition 28 Let g(u) = £(P(u)) where £ is a bounded linear functional in V* . 
Under Assumption [21\ the total number of floating point operations required for 
performing M steps in the Metropolis-Hastings method as N,M —> oo is bounded by 
0(N« of (log N do{ f + MN log N) . 

Proof Under Assumption |2T| the cost of solving one instance of problem (141)) is bounded 
by 0(A^ of (log N&otY). At each MCMC step, we need to evaluate the observation 
functionals 

O l {P N , c {u {k) )) = 0*{P»,c)LM k) ) ■ (46) 



N 
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We note in passing that the storage of the parametric gpc-type representation of the 
forward map (H6l) requires only one real per gpc mode, provided that only junctionals of 
the forward solution are of interest. We now estimate the complexity of computing one 
draw of the forward map fj46|) . For v G J 7 , each multivariate Legendre polynomial 
L v (u k ) can be evaluated with float point operations. As \v\ = O(logiV), 

computing the observation functionals O^P^^c) requires 0(N\ogN) floating point 
operations. Thus we need 0(N doi (\og N doi Y + MN log N) floating point operations 
to perform M steps of the Metropolis-Hastings method with sampling of the surrogate, 
sparse gpc-Finite Element representation of the forward map. 

□ 

Theorem 29 For g(u) = £(P(u)) where £ is a bounded linear functional £ G V* , under 
Assumptions W[ [771 and \2% with probability PN doi (t) the conditional expectation K pS [g(u)} 
can be approximated with N do { degrees of freedom, incurring an error of 0(N d J f ) using 
not more than 

cN« oi (log N doi y + ct 2 N 2 d T /r log(iV dof ) 

many floating point operations, where 

nc't 1 

lim P7v dof (£) -> / —, ==exp(-x 2 /2)cb, 

7V dof ^oo J_ c , t ^2lX 

for some constants c,c' independent of N do f. 

In the mean square with respect to the measure can be approximated 

with Ndof degrees of freedom, with an error N^Jj- using not more than 

0(N« oi (hgN do{ r + N 2 d T /r log(iV d of)) 
floating point operations. 

Proof We relate the number of MCMC realizations M with the total number of degrees 
of freedom N^f by equating the terms in the error bound f !45p . To this end, we choose 
M = t 2 N do ~ f where t = cs\£m\] the constant eg and the random variable £&f are as in 
Proposition [26] . With N = iV^ f r , the number of floating point operations required in 
Proposition [2S] is bounded by 

ciYXf (log N do{ f + ct 2 N 2 d i; r/r \ogN doi . 

As £m converges weakly to the normal Gaussian variable, we deduce the limit for the 
probability density PN dot {t) of the random variable t. The proof for the mean square 
approximation is similar. □ 

Remark 30 In the previous section, the parametric PDE j[Tb}) is to be solved once at 
every step of the MCMC process, using N do { degrees of freedom, with 0{N do 2 ^ d ) steps 
required (the multiplying constant depends on a random variable when we consider the 
realization-wise error). Ignoring log factors, the resulting error can be expressed in 
terms of the total number of floating points operations Nf p as 0(N7^ d+2+1 ^ q ^). Here, 
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the forward PDE is solved for every realization before running the MCMC process. The 
rate of convergence of the MCMC process in terms of the total number of floating point 
operations used is 0(N ^ mm ( T / Q > 1 /( 2 +i/r))^ can ^ significantly smaller than the rate 
of convergence in Theorem\T^ when a is close to 1. For example, with the decay rate 
of 1 1 V^' I |oo i n Assumption 12E, the summability constant p in Assumption [7PI can be any 
constant that is greater than 1/s. Therefore the constant r in Proposition Uffl can be any 
psitive constant smaller than s — 1/2. On the other hand, the constant q in Assumption 
ITU is bounded by s — 1. As 

1 , 1 
2 + <d + 2 + , 

s - 1/2 s - 1 ' 

we therefore can choose r so that 
1 1 
2 + 1/r > d + 2 + 1/q ' 
As shown in JE/, when (\\Pu\\h 2 (d))u G l v [J : \ r can be chosen as 1/d. Thus, when a is 
sufficiently close to 1, the complexity of the sparse gpc-MCMC approach is superior to 
that of the plain MCMC approach in the previous section. 

6. Multilevel MCMC 

We showed that substantial complexity reduction is possible in the "plain" MCMC FE 
sampling of the posterior measure p s introduced in Section H] provided that all samples 
are computed from one precomputed sparse tensor gpc-representation of the forward 
map of the parametric, deterministic problem ( JT6l) . We proved, in particular, that the 
forward map Q{u) is obtained from continuous, linear functionals Oi(-) on the forward 
solution U 3 u y P(-, u) £ V allowing for a sparse approximate representation of gpc- 
type. Lower efficiency results if, for example, the rate of convergence of the procedure 
for computing the solution of the sparse tensor finite element solution in (T4TI) is slow 
with respect to the total number of degrees of freedom, and/or if the complexity grows 
superlinearly with respect to the number of degrees of freedom. 

Although an increasing number of algorithms for the efficient computation of 
approximate responses of the forward problem on the entire parameter space U are 
available (e.g. O I2TJ [11] [2j H]) and therefore the gpc-MCMC is feasible, many systems 
of engineering interest do not readily admit gpc-based representations of the parametric 
forwards maps. Finding other non-gpc based methods for reducing complexity of 'direct' 
MCMC sampling under p 5 is therefore of interest. In this section, we give sufficient 
conditions on the data and on the ipj such that complexity reduction is possible by 
performing a multilevel sampling procedure where a number of samples depending on 
the discretization parameters are used for problem (Tl2l . 

6.1. Derivation of the MLMCMC 

Consider I £ V*, i.e. a bounded linear functional on V. We aim at estimating 
EP s [£(P(u))] where P is the solution of problem (EJ). For each level / = \L/2], \L/2] + 
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1,...,L, we assume that problem (1121) is discretized with the truncation of the 
Karhiinen-Loeve expansion after J terms with J = Ji as defined in (122]) and with a 
finite element discretization mesh of width hi. The multilevel FE-discretization of the 
forward problem (Tl2|) and the truncation (122]) induces a corresponding hierarchy of 
approximations p J ' l ' s of the posterior measure p 6 . 

Following [U HH [10] the MLMCMC will be based on sampling a telescopic 
expansion of the discretization error with a level- dependent sample size. 

We continue to work under Assumptions E] [TQl [HJ We recall the sequence of 
discretization levels in the FE discretizations in D in Assumption (TTJ and the gpc input's 
truncation dimension J in Assumption [TOJ We then derive the Multilevel MCMC-FEM 
as follows. First, we note that there exists C > independent of L such that 

\E pS [£(P(u))\ - E ps [£(P jL > L (u))}\ < Csup \\P(u) - P Jl > l {u)\\ v < C2~ L . (47) 
We then write 

L 

E p6 [£{P Jl ' l )] = E pS [£(P J w^ L W)} + ^ p6 [K pJl ' 1 ) ~ ^P Jl - 1 ' l ~ 1 )] ■ (48) 

l=\L/2-]+l 

For I = \L/2] , \L/2] + 1, . . . , L, let 

Q Jl \u) = {O x {P Jl ' l {u)), 2 (P J '\u)), O d (P° h \u))} 

and 

= -g J ^\u)\ 2 . 

We introduce, for each I E N, the Markov chains C\ = (vP^)k which are generated by 
with the acceptance probability a(u, v) in (j6]) being replaced by 

a Jl ' l (u,v) = 1 Aexp($ J '-'(u;<5) - <S> Jl > l (v; 8)) , («, v) £ U x U . (49) 

Then the chains C\ are pairwise uncorrelated. 

For Mi E N, £ £ V* and for a function Q : U — > V, we define the sample average 
with respect to the multilevel approximation of the Markov chain thus defined by 

1 M l 

< w w)4E# w )). 
' fe=i 

We denote by £l = {C|x/2] , C\l/2\+i, ■ ■ ■ , Cl}; and denote by the product probability 
measure on the probability space that describes the law of <L L : 

•— pPi^rVal'TV 2 ! (g, -pP,J[L/2-]+i,\L/2]+l (g, _ _ _ (g, <pp,JL,L _ 

Let (El be the expectation with respect to under the probability We then 

approximate the right hand side of (I48p by 

L 

T := E$ L, ^ Lm '\l{P J W*\J L M)\ + £ ^'^(P^ - P^-'" 1 )] , (50) 

z= 17721+1 

where the number of samples M; and M are to be determined. 



Sparse MCMC gpc Finite Element Methods for Bayesian Inverse Problems 



31 



Remark 31 As shown below, our bound on the error of the approximation of 
E ps [£(P Jl ' 1 ) — £(P Jl - 1 ' l ~ 1 )] by W Jl ' u ~'[£(P J '> 1 ) - l(P Jl -^ 1 - 1 )} involves the term 2~ 21 . Thus, 
to achieve an approximation error ofO(2~ L ) (ie. of the order of the discretization error 
in one instance of the forward problem) in the estimated expectation, we only perform 
the telescoping process from \L/2\. 



6.2. Complexity Analysis of the MLMCMC-FEM 

From Proposition [T5l we deduce that there exists C(8) > such that for all I £ N 

EP 5 [£(P J » l ) - £{P J ^ 1 - 1 )] - E pJ " l ' S [£(P Jl ' 1 ) - £{P Jl -^ 1 )} 

< C{5)(J~ q + 2" 1 ) sup \t(P Jl '\u)) - £{P Jl ^ l ~\u))\ 

ueu 

< C{5){J~ q + 2~ l )2- 1 . 

Following the procedure in the proof of Proposition [16], we have 



(51) 



£ 



p,Ji,i 



(52) 



W [£(P Jl ' 1 ) - £{P J ^ 1 - 1 )} - E p M ' t ' ' [£(P Jl ' 1 ) - £(P- h -^ 1 - 1 )} 
< Mf 1 sup \£(P J " 1 ) - £{P Jl -^~ l )\ 2 

Similarly, we have that 

E pS [£{P J w^ L W)} - W J ^ mALm, \£{P J ^^ L '^)} < C(5)(J r 7 /2l + 2- L/2 )2- L/2 , (53) 
and 

£P,J\L/2],\L/2] 



< CM- 1 . 



E /rv2vrw r ^ Dj 



^ p j [l/2] ,\l/2^ _ E^ mvlLm '\i{P J \mv\ L /^)] 



(54) 



From equations (JUD, (j48]), (J5U), O, ([53]) and ([54)) with the Cauchy Schwartz 
inequality that there exists a constant C > such that, for any L £ N, and for any 
choice {Ji}f =0 and M £ N, there holds 



£ L [\W [£{P)}-T\ 



1/2 



< C2~ L + C(8)(J- q /2] + 2~ L ' 2 )2~ L / 2 + CM- 1 ' 2 



+ CL 1 ' 2 ( J i~ 9 + 2 - ' + M~ l/2 )2~ l 

l=\L/2]+l 

Choosing here J t = [2^], M t = 2 2 ^ l \ J [L/2l = 2 L I^ and M = 2 2L , we then find 
'<S L [\W 5 [£{P)\ -T| 2 ]) V2 < C(5)L 3/2 2~ L . 



As I oo, the number of degrees of freedom used for computing P Jl ' 1 — ph-it- 1 is 
0(2 dl ) for a single sample of u. 

The total number of degrees of freedom for computing E P M \£{P Jul ) — £[P Jl - l,l ~ 1 )\ 
behaves, asymptotically, as / — > oo, as 0(Mi2 dl ). Likewise, the total number of degrees of 



Sparse MCMC gpc Finite Element Methods for Bayesian Inverse Problems 32 

Jr L ; 2 -| ,{L/2],6 

freedom required for computing E P M [i[P J \^m^ L / 2 \)] is 0(M2 dL ' 2 ) as L -> oo. 

The total number of degrees of freedom is therefore not larger than 

L L 

M fi dl + M2 dL/2 < 2 2(L ~' ) 2 d/ + 2 2L 2 dL/2 < 2 {2+d/2)L . (55) 

l=\L/2]+l l=\L/2-]+l 

As in Section HJ to form the stiffness matrix to compute the Finite Element 
approximation P Jl ' 1 of P requires, for each realization of u, not more than 0(2 7 ) J/ = 
O {l d ~~ 1 2 dl+l ' q ) float point operations. The number of floating point operations required 
for the computation of the Finite Element approximation P j [l/2],\l/2] - g no ^. j ar g er 
than 0(L d ~ l 2 lydL l 2+L ^ 2q ^). The total number of floating point operations required is 
asymptotically, as L — > oo, bounded by 

L 

22(L-l)j^d-l2dl+l/q _|_ 2 2L jd-l2{dL/2+L/(2q)) ^ ^dL+L/q + 2( d / 2+1 /(2g)+2)L ^ 

l=\L/2]+l 

Setting = 2 dL , we have thus shown the following result. 

Theorem 32 The expectation E p [£{P)\ can be approximated by Multi-Level MCMC 
FEM based on a continuous, piecewise linear FEM on a family of quasiuniform, 
shape-regular triangulations of meshwidth h in D to a mean-square error 0(h) = 
0(N~ 1 / d (\ogN) 3 / 2 ) using a total of 0(N 1 / 2+2 / d ) degrees of freedom and a total of 
0((logiV) <i - 1 iV max ( 1 / 2+1 /( 2d9 ) +2 / <i ' 1+1 /( d9) ) floating point operations. 

Remark 33 For the gpc-MLMCMC method, when using Nf p floating point operations 
and ignoring logarithmic terms in the cost estimates, the error can be written as 

^-min{l/(d+l/ ? ),l/(d/2+2+l/(2g))^ 

This rate is always superior to the rate N l f ! p {d+2+l/q) for the pi ain MCMC method which 
we found in Section [^} 

Comparing to the sparse gpc-MCMC method in section El assuming that we have 
the optimal rate N^/ d i.e. t — \jd in Assumptionl21\ when a > 1/2 + 2/d (assumming 
that q ~ s — 1 is large), the method presented in this section is superior in terms of 
accuracy versus complexity. 

Although we only presented the MLMCMC approach for the particular elliptic 
problem ( TX2)) with a special distribution (T73p, it is expected that for other distributions of 
the coefficient K , when Assumptionl21\ does not hold, or hold with a constant a that is 
not close to 1, the MLMCMC approach is a superior alternative method than the plain 
MCMC to compute the expectation of a function g(u) = £(P(u)) where i G V* . 



7. Conclusions 

We note that in [22] an entirely deterministic approach to the solution of the Bayesian 
inverse problem is presented. However it is to be expected that the methods presented 
herein will be superior in practice, for some problems, because of the ability of MCMC 
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based methods to sample measures which concentrate on small parts of the space. A 
detailed comparison of the computational performance of the gpc-accelerated MLMCMC 
methods of this paper with the deterministic approach in [22] will provide useful 
information about their relative merits. 

We also observe that we have concentrated on a very special MCMC method, 
namely the independence sampler. This will work well when the negative log likelihood 
$ does not vary too much, but will be inefficient in general. More appropriate MCMC 
methods may be found in [7] . However for these more general methods the analysis of the 
Markov chain based on the methods of [18] are not appropriate and more sophisticated 
arguments are required, as presented in [T2j . 
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